An outbreak of Ebola occurred in the Equator province of the Democratic Republic of Congo in April 2018. The WHO published its first Disease Outbreak News on the outbreak on May 10th and began efforts to conduct ring vaccination around the area. You can read more about it on the WHO’s website on the 2018 Ebola outbreak in the DRC.

Since the start of the outbreak, Caitlin Rivers, an Assistant Professor at the John Hopkins Center for Health Security has been digitizing the WHO Disease Outbreak Network situations reports and the DRC Ministry of Health mailing list reports and posting the data on her github repository.

I’ve put together some visualizations to explore the data here. This is more of an exploration of the data than a tutorial in how to do so, but all of the code is provided if you want to take a shot at it yourself.

Last Updated: June 9, 2018

Packages you’ll need

library(tidyr)
library(sp)
library(leaflet)
library(rgdal)
library(RCurl)
library(ggplot2)
library(dplyr)
library(plotly)

Accessing Github Data

All of the data is in a public github repository that can be accessed within R, making it easy to update by simply rendering the document. You need the RCurl package to read the csv file. To read in a github csv file you need click on the “Raw” button to get the url of the raw text file.

who.data <- read.csv(text = getURL( "https://raw.githubusercontent.com/cmrivers/ebola_drc/master/who/data.csv"), stringsAsFactors = F, header = T)

who.data$source <- "WHO"

drc.data <- read.csv(text = getURL("https://raw.githubusercontent.com/cmrivers/ebola_drc/master/drc/data.csv"), stringsAsFactors = F, header = T)
drc.data$source <- "MOH"

According to the WHO website, this includes the confirmed, probable, and suspected cases. Deaths are split with an additional category for health care workers. As of 2018-05-14, the data was being reported as cumulative, although the MOH reports also include a category for new cases, and deaths may not be cumulative.

head(who.data)
##   event_date report_date province health_zone confirmed_cases confirmed_deaths
## 1   20180101    20180511 Equateur      Bikoro              NA               NA
## 2   20180505    20180511 Equateur      Bikoro              NA               NA
## 3   20180507    20180511 Equateur      Bikoro               2               NA
## 4   20180509    20180510 Equateur      Bikoro               2               NA
## 5   20180511    20180511 Equateur      Bikoro               2               NA
## 6   20180511    20180514 Equateur     Wangata              NA               NA
##   probable_cases probable_deaths suspect_cases suspect_deaths cumulative_hcw
## 1             NA              NA            21             17             NA
## 2             NA              NA            26             NA             NA
## 3             NA              NA            24             NA             NA
## 4             16              NA            12             18              3
## 5             18              NA            14             18              3
## 6              2              NA            NA             NA             NA
##   contacts_traced source
## 1              NA    WHO
## 2              NA    WHO
## 3              NA    WHO
## 4              NA    WHO
## 5              75    WHO
## 6              43    WHO
head(drc.data)
##   event_date report_date health_zone confirmed_cases new_confirmed
## 1   20180501    20180510      Bikoro              NA            NA
## 2   20180510    20180510      Bikoro               2            NA
## 3   20180511    20180511      Bikoro               2            NA
## 4   20180511    20180511       Iboko               0            NA
## 5   20180512    20180512      Bikoro               2            NA
## 6   20180512    20180512       Iboko               0            NA
##   probable_cases new_probable suspect_cases new_suspect deaths source
## 1             NA           NA            21          NA     17    MOH
## 2             NA           NA             9          NA      1    MOH
## 3             NA           NA             6          NA      0    MOH
## 4             NA           NA             6          NA      1    MOH
## 5             NA           NA             6          NA      0    MOH
## 6             NA           NA             6          NA      1    MOH

As of June 9 2018, the Ministry of Health Data seems to be more current (goes until 2018-06-06) so I’m just going to focus on that.

ebola.data <- drc.data %>%
  mutate(Date = as.Date(as.character(event_date), format = "%Y%m%d")) %>%
  #selet columns of interest
  select(-starts_with("new")) %>%
  gather(type, number, confirmed_cases:deaths) %>%
  select(Date, health_zone, type, number)

Epidemic Curve

The epidemic curve shows the number of cases per reporting period. Here, I plot the cumulative cases, which is how the CDC and MOH report them.

p <- ebola.data %>%
  dplyr::select(-health_zone) %>%
  dplyr::filter(type != "deaths") %>%
  #shrink labels by wrapping
  mutate(labels = gsub("_", "\n",type)) %>%
    #order labels
  mutate(labels = factor(labels, levels = c("suspect\ncases", "probable\ncases",
                                            "confirmed\ncases"))) %>%
  ggplot(., aes(x=Date, y = number)) +
  geom_bar(stat="identity", aes(fill = labels), color = NA) +
  theme_minimal() +
  ylab("No. of cases")+
  scale_fill_brewer(palette = "Set2", type = "qual", name = "Reporting Type")
  
ggplotly(p) %>%
  layout(margin = list(r=100))
## Warning: Removed 7 rows containing missing values (position_stack).

As of June 9 2018, the growth in cases has slowed and the slope is almost zero, suggesting the epidemic has burned out.

Spatial Distribution of Suspected Cases

The cases are reported by health zone. You can download a shapefile or GeoJSON of the DRC’s health zones here.

When this outbreak started, most digital data of the healthzones of the DRC was wrong (including my own!). You can read about how a group of Congolese cartographers worked to update and correct these maps in this Atlantic article by Ed Yong. The WHO and CDC have since updated its maps, but (as of June 9) I haven’t been able to find the correct shapefile freely available online.

health.zones <- readOGR("../../static/data/drc-ebola", "healthZones2", stringsAsFactors = F)
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/mvevans/Dropbox/git/mevansblog/static/data/drc-ebola", layer: "healthZones2"
## with 515 features
## It has 2 fields
moh.zone <- ebola.data %>%
  group_by(health_zone) %>%
  #get lastest reporting date
  filter(Date == max(Date)) %>%
  #spread data to work with leaflet
  spread(type, number) %>%
  #calculate total cases
  mutate(total_cases = sum(suspect_cases, probable_cases, confirmed_cases)) %>%
  select(Name = health_zone, suspect_cases, probable_cases, confirmed_cases, total_cases, deaths)
moh.map <- merge(health.zones, moh.zone, by = "Name", all.x = T) 
#fix NAs
moh.map@data[is.na(moh.map@data)] <- 0
#create color palette
pal <- colorNumeric("YlOrRd", domain = moh.map$total_cases)

labels.moh <-  sprintf(
  "<strong>%s</strong><br/>%g Total Cases, %g Deaths (DRC MOH)",
  moh.map$Name, moh.map$total_cases, moh.map$deaths
) %>% lapply(htmltools::HTML)
leaflet(moh.map) %>%
    # basemap
    addProviderTiles(providers$OpenStreetMap.HOT) %>%
    addPolygons(color = "gray08", 
                fillOpacity = 0.5, 
                fillColor = ~pal(total_cases),
                label = labels.moh,
                highlightOptions = highlightOptions(color = "#444444", weight = 2,
                                                  bringToFront = TRUE)) %>%
    # add legend
    addLegend(pal = pal, values = ~total_cases, opacity = 0.7, title = "Total\nCases",
              position = "bottomright") %>%
    # set custom zoom
    setView(lng = 17.1, lat = -1.3, zoom = 7)

The majority of cases were not in densely populated areas, although there was a confirmed case in Mbandaka (Wangata Health Zone) in late May (WHO Report).

I will try and keep this updated as data comes in. Check the Last Updated tag at the top of this post to see when the data was last updated.